Modern vehicles rely on a fleet of electronic control units (ECUs) connected through the controller area network (CAN) bus for critical vehicular control. However, with the expansion of advanced connectivity features in automobiles and the elevated risks of internal system exposure, the CAN bus is increasingly prone to intrusion and injection attacks. Ordinary injection attacks disrupt the typical timing properties of the CAN data stream, and rule-based intrusion detection systems (IDS) can easily detect them. Advanced attackers, however, can inject false data into the time-series sensory data (signals) while appearing innocuous in the pattern/frequency of CAN messages. Such attacks can bypass rule-based IDS, as well as any anomaly-based IDS built on binary payload data. To make vehicles robust against such intelligent attacks, we propose CANShield, a signal-based intrusion detection framework. CANShield consists of three modules: a data preprocessing module that handles the high-dimensional CAN data stream at the signal level and makes it suitable for deep learning models; a data analyzer module consisting of multiple deep autoencoder (AE) networks, each analyzing the time-series data from a different temporal perspective; and finally an attack detection module that uses an ensemble method to make the final decision. Evaluation results on two high-fidelity signal-based CAN attack datasets show the high accuracy and responsiveness of CANShield in detecting advanced intrusion attacks.
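At the signal level, the detection idea can be sketched with a toy ensemble: several analyzers view the same signal stream at different time resolutions and vote on whether the reconstruction error is anomalous. This is only an illustration of the ensemble logic, with a moving-average reconstructor standing in for the trained autoencoders; the signal, strides, and threshold are all assumptions, not the framework's actual configuration.

```python
import numpy as np

def reconstruction_error(window, kernel=5):
    """Stand-in for a trained autoencoder: 'reconstruct' the window
    with a moving average and return the mean squared error."""
    pad = kernel // 2
    padded = np.pad(window, pad, mode="edge")
    recon = np.convolve(padded, np.ones(kernel) / kernel, mode="valid")
    return np.mean((window - recon) ** 2)

def ensemble_detect(signal, strides=(1, 2, 4), threshold=1e-3):
    """One analyzer per stride views the signal at a different time
    resolution; a majority vote over their anomaly flags decides."""
    votes = [reconstruction_error(signal[::s]) > threshold for s in strides]
    return sum(votes) > len(votes) / 2

rng = np.random.default_rng(0)
t = np.linspace(0.0, 4.0 * np.pi, 512)
clean = np.sin(t) + 0.01 * rng.standard_normal(t.size)

attacked = clean.copy()
attacked[200:260] += 1.5        # injected false sensor values

print(ensemble_detect(clean), ensemble_detect(attacked))
```

A signal-level injection leaves the message frequency untouched, which is why the vote is taken over reconstruction errors of the signal values rather than over message timing.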
We show that on the manifold of fixed-rank symmetric positive semidefinite matrices, the Riemannian gradient descent algorithm almost surely escapes some spurious critical points on the boundary of the manifold. Our result is the first to partially overcome the incompleteness of the low-rank matrix manifold without altering the vanilla Riemannian gradient descent algorithm. The spurious critical points are rank-deficient matrices that capture only part of the eigen components of the ground truth. Unlike classical strict saddle points, they exhibit very singular behavior. We show that using the dynamical low-rank approximation and a rescaled gradient flow, some of the spurious critical points can be converted to classical strict saddle points in the parameterized domain, which leads to the desired result. Numerical experiments are provided to support our theoretical findings.
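As an illustration of descent constrained to the PSD variety (not the paper's algorithm or analysis), a factored parameterization X = Y Yᵀ lets plain gradient descent recover a full-rank-3 ground truth from a generic random start, rather than stalling at a rank-deficient matrix that captures only some eigen components. The matrix sizes, spectrum, and step size below are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Ground truth: a symmetric PSD matrix with known eigenvalues 3, 2, 1.
Q, _ = np.linalg.qr(rng.standard_normal((8, 3)))
M = Q @ np.diag([3.0, 2.0, 1.0]) @ Q.T

# Factored parameterization X = Y Y^T keeps every iterate PSD of rank
# at most 3; plain gradient descent on Y then realizes a constrained
# descent for f(X) = 0.5 * ||X - M||_F^2.  A generic random start
# almost surely avoids the rank-deficient critical points.
Y = 0.1 * rng.standard_normal((8, 3))
for _ in range(2000):
    grad = 2.0 * (Y @ Y.T - M) @ Y      # d f / d Y
    Y -= 0.05 * grad

print(np.linalg.norm(Y @ Y.T - M))      # near zero: all components recovered
```

A spurious critical point in this picture would be a Y spanning only a subset of Q's columns; the random initialization places mass on every eigendirection, so all three grow.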
State space models (SSMs) have demonstrated state-of-the-art sequence modeling performance in some modalities, but underperform attention in language modeling. Moreover, despite scaling nearly linearly in sequence length instead of quadratically, SSMs are still slower than Transformers due to poor hardware utilization. In this paper, we make progress on understanding the expressivity gap between SSMs and attention in language modeling, and on reducing the hardware barrier between SSMs and attention. First, we use synthetic language modeling tasks to understand the gap between SSMs and attention. We find that existing SSMs struggle with two capabilities: recalling earlier tokens in the sequence and comparing tokens across the sequence. To understand the impact on language modeling, we propose a new SSM layer, H3, that is explicitly designed for these abilities. H3 matches attention on the synthetic languages and comes within 0.4 PPL of Transformers on OpenWebText. Furthermore, a hybrid 125M-parameter H3-attention model that retains two attention layers surprisingly outperforms Transformers on OpenWebText by 1.0 PPL. Next, to improve the efficiency of training SSMs on modern hardware, we propose FlashConv. FlashConv uses a fused block FFT algorithm to improve efficiency on sequences up to 8K, and introduces a novel state passing algorithm that exploits the recurrent properties of SSMs to scale to longer sequences. FlashConv yields 2$\times$ speedup on the long-range arena benchmark and allows hybrid language models to generate text 1.6$\times$ faster than Transformers. Using FlashConv, we scale hybrid H3-attention language models up to 1.3B parameters on the Pile and find promising initial results, achieving lower perplexity than Transformers and outperforming Transformers in zero- and few-shot learning on a majority of tasks in the SuperGLUE benchmark.
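The O(N log N) convolution at the heart of this efficiency argument can be checked in a few lines: an SSM layer's output is a long causal convolution of the input with a sequence-length kernel, and zero-padded FFTs compute it exactly. This shows only the mathematical identity that FlashConv's fused block-FFT kernel builds on, not the kernel itself; the filter and input here are random stand-ins.

```python
import numpy as np

def fft_causal_conv(k, u):
    """Causal linear convolution of kernel k with input u via FFT:
    zero-pad to 2n so the circular convolution equals the linear one,
    then keep the first n outputs."""
    n = len(u)
    m = 2 * n
    y = np.fft.irfft(np.fft.rfft(k, m) * np.fft.rfft(u, m), m)
    return y[:n]

rng = np.random.default_rng(0)
k = rng.standard_normal(1024)   # convolution kernel derived from SSM parameters
u = rng.standard_normal(1024)   # input sequence

direct = np.convolve(k, u)[:1024]          # O(N^2) reference
print(np.allclose(fft_causal_conv(k, u), direct))   # True
```

The state-passing algorithm in the paper extends this idea past the FFT block size by carrying the SSM's recurrent state between blocks.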
Traditionally, data analysis and theory have been viewed as separate disciplines, each feeding into fundamentally different types of models. Modern deep learning technology is beginning to unify these two disciplines and will produce a new class of predictively powerful space weather models that combine the physical insights gained by data and theory. We call on NASA to invest in the research and infrastructure necessary for the heliophysics community to take advantage of these advances.
The physics-informed neural operator (PINO) is a machine learning architecture that has shown promising empirical results for learning partial differential equations. PINO uses the Fourier neural operator (FNO) architecture to overcome the optimization challenges often faced by physics-informed neural networks. Since the convolution operator in PINO uses the Fourier series representation, its gradient can be computed exactly on the Fourier space. While Fourier series cannot represent nonperiodic functions, PINO and FNO still have the expressivity to learn nonperiodic problems with Fourier extension via padding. However, computing the Fourier extension in the physics-informed optimization requires solving an ill-conditioned system, resulting in inaccurate derivatives which prevent effective optimization. In this work, we present an architecture that leverages Fourier continuation (FC) to apply the exact gradient method to PINO for nonperiodic problems. This paper investigates three different ways that FC can be incorporated into PINO by testing their performance on a 1D blowup problem. Experiments show that FC-PINO outperforms padded PINO, improving equation loss by several orders of magnitude, and it can accurately capture the third order derivatives of nonsmooth solution functions.
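The "exact gradient on the Fourier space" property can be illustrated with spectral differentiation on a periodic grid: for a function represented by its Fourier series, differentiation is exact multiplication by ik per mode. The nonperiodic case, where this breaks down without an extension or continuation of the function, is precisely the paper's concern. The grid size and test function below are illustrative.

```python
import numpy as np

n = 128
x = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
f = np.sin(3.0 * x)                        # periodic on [0, 2*pi)

k = np.fft.rfftfreq(n, d=1.0 / n)           # integer wavenumbers 0..n/2
df = np.fft.irfft(1j * k * np.fft.rfft(f), n)   # d/dx via multiplication by i*k

# Spectrally exact: agrees with the analytic derivative 3*cos(3x)
# to near machine precision.
print(np.max(np.abs(df - 3.0 * np.cos(3.0 * x))))
```

Applying the same transform to a nonperiodic function (e.g. f = x on this grid) would instead produce large Gibbs oscillations at the boundary, which is the failure mode that Fourier continuation is designed to remove.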
Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after undergoing multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License.
Transformer-based models, capable of learning better global dependencies, have recently demonstrated exceptional representation learning capabilities in computer vision and medical image analysis. The Transformer reformats the image into separate patches and realizes global communication via the self-attention mechanism. However, positional information between patches is hard to preserve in such 1D sequences, and its loss can lead to sub-optimal performance when dealing with large amounts of heterogeneous tissues of various sizes in 3D medical image segmentation. Additionally, current methods are not robust and efficient for heavy-duty medical segmentation tasks such as predicting a large number of tissue classes or modeling globally inter-connected tissue structures. Inspired by the nested hierarchical structures in vision transformers, we propose a novel 3D medical image segmentation method (UNesT), employing a simplified and faster-converging transformer encoder design that achieves local communication among spatially adjacent patch sequences by aggregating them hierarchically. We extensively validate our method on multiple challenging datasets, covering anatomies of 133 structures in the brain, 14 organs in the abdomen, 4 hierarchical components in the kidney, and inter-connected kidney tumors. We show that UNesT consistently achieves state-of-the-art performance and evaluate its generalizability and data efficiency. In particular, the model completes the whole-brain segmentation task with a full ROI of 133 tissue classes in a single network, outperforming the prior state-of-the-art method SLANT27, an ensemble of 27 network tiles; it increases the mean DSC score on the publicly available Colin and CANDI datasets from 0.7264 to 0.7444 and from 0.6968 to 0.7025, respectively.
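The hierarchical local aggregation can be sketched as a patch-merging step: each 2x2x2 block of spatially adjacent patch tokens is concatenated into one coarser token, so neighbors communicate before any global mixing. The function, grid, and channel sizes below are a hypothetical illustration, not UNesT's actual encoder.

```python
import numpy as np

def merge_patches(tokens, grid):
    """Concatenate each 2x2x2 block of spatially adjacent patch
    tokens into a single coarser token.
    tokens: (D*H*W, C) array in row-major grid order."""
    d, h, w = grid
    c = tokens.shape[1]
    t = tokens.reshape(d // 2, 2, h // 2, 2, w // 2, 2, c)
    t = t.transpose(0, 2, 4, 1, 3, 5, 6)          # group each 2x2x2 block
    return t.reshape((d // 2) * (h // 2) * (w // 2), 8 * c)

# A 4x4x4 grid of patch tokens with 3 channels each.
tokens = np.arange(4 * 4 * 4 * 3, dtype=float).reshape(64, 3)
merged = merge_patches(tokens, (4, 4, 4))
print(merged.shape)   # (8, 24): 8x fewer tokens, 8x wider channels
```

Stacking such merges yields the nested hierarchy: each level attends only within small local neighborhoods, and coarser levels see progressively larger spatial context.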
IceCube, a cubic-kilometer array of optical sensors built to detect atmospheric and astrophysical neutrinos between 1 GeV and 1 PeV, is deployed 1.45 km to 2.45 km below the surface of the ice sheet at the South Pole. The classification and reconstruction of events from the in-ice detectors play a central role in IceCube data analysis. Reconstructing and classifying events is a challenge due to the irregular detector geometry, inhomogeneous scattering and absorption of light in the ice, and, below 100 GeV, the relatively low number of signal photons produced per event. To address this challenge, IceCube events can be represented as point cloud graphs, with graph neural networks (GNNs) serving as the classification and reconstruction method. The GNN is capable of distinguishing neutrino events from cosmic-ray backgrounds, classifying different neutrino event types, and reconstructing the deposited energy, direction, and interaction vertex. Based on simulation, we provide a comparison in the 1-100 GeV energy range to the state-of-the-art maximum likelihood techniques used in current IceCube analyses, including the effects of known systematic uncertainties. For neutrino event classification, the GNN increases the signal efficiency by 18% at a fixed false positive rate (FPR), compared to the current IceCube method. Alternatively, the GNN reduces the FPR by over a factor of 8 (to below half a percent) at a fixed signal efficiency. For the reconstruction of energy, direction, and interaction vertex, the resolution improves by an average of 13%-20% compared with current maximum likelihood techniques. When running on a GPU, the GNN is capable of processing IceCube events at a rate nearly matching the median IceCube trigger rate of 2.7 kHz, which opens the possibility of using low-energy neutrinos in online searches for transient events.
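The point-cloud-graph representation can be sketched as follows: each hit carries a sensor position and measured features, and one GNN-style round aggregates features over the k nearest spatial neighbors with a permutation-invariant operation. The shapes, feature count, k, and mean aggregation are illustrative stand-ins, not the actual IceCube model.

```python
import numpy as np

rng = np.random.default_rng(0)
pos = rng.standard_normal((20, 3))    # (x, y, z) positions of 20 hit sensors
feat = rng.standard_normal((20, 4))   # per-hit features (e.g. charge, time)

def knn_message_pass(pos, feat, k=5):
    """One round of message passing on a kNN graph built from the
    hit positions: each node appends the mean of its k nearest
    neighbors' features to its own."""
    d2 = ((pos[:, None, :] - pos[None, :, :]) ** 2).sum(-1)
    nbrs = np.argsort(d2, axis=1)[:, 1:k + 1]   # k nearest, excluding self
    messages = feat[nbrs].mean(axis=1)          # permutation-invariant aggregate
    return np.concatenate([feat, messages], axis=1)

out = knn_message_pass(pos, feat)
print(out.shape)   # (20, 8): original features plus aggregated messages
```

Because the graph is rebuilt from the hit positions of each event, this representation adapts naturally to the sparse, irregular sensor geometry, unlike a fixed image grid.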
The lack of insight into deep learning systems hinders their systematic design. In science and engineering, modeling is a methodology used to understand complex systems whose internal processes are opaque. Modeling replaces a complex system with a simpler surrogate that is more amenable to interpretation. Drawing inspiration from this, we construct a class of surrogate models for neural networks using Gaussian processes. Rather than deriving kernels from certain limiting cases of neural networks, we learn the kernels of the Gaussian processes empirically from the naturalistic behavior of neural networks. We first evaluate our approach with two case studies, inspired by previous theoretical studies of neural network behavior, in which we capture neural network preferences for learning low frequencies and identify pathological behavior in deep neural networks. In a further practical case study, we use the learned kernels to predict the generalization properties of neural networks.
Estimating uncertainty is at the core of performing scientific measurements in HEP: a measurement is useless without an estimate of its uncertainty. The purpose of uncertainty quantification (UQ) is inextricably linked to the question: "How do we physically and statistically interpret these uncertainties?" The answer to this question depends not only on the computational task we aim to perform, but also on the methods we use for that task. For artificial intelligence (AI) applications in HEP, there are several areas in which interpretable UQ methods are essential, including inference, simulation, and control/decision-making. Some methods exist in each of these areas, but they have not yet been demonstrated to be as trustworthy as the more traditional approaches currently used in physics (e.g., non-AI frequentist and Bayesian methods). Shedding light on the question above requires a better understanding of the interplay between AI systems and uncertainty quantification. We briefly discuss the existing methods in each area and relate them to tasks across HEP. We then discuss recommendations for pathways to develop the techniques necessary for the reliable use of AI with UQ over the coming decade.